Algorithm for finding coding signal using homogeneous Markov chains independently for three codon positions

نویسندگان

  • Paweł Błażej
  • Paweł Mackiewicz
چکیده

Many currently used algorithms for protein coding sequences require large learning sets of true genes to estimate sensible values for used parameters which are necessary to make the prediction reasonable. They also fail in recognition of short genes which usually contain weak coding signal. To avoid these problems, we worked out a new algorithm for finding protein coding potential in prokaryotic genomes. This algorithm uses homogeneous Markov chain for modeling nucleotide transition between fixed positions in codons thereby reduces order of Markov chain retaining simultaneously information on dependence between nucleotides in sequence on relatively long distances. We tested performance of this algorithm in relationship to size of the learning set with true and false positive rates for different model orders. We also made some comparisons between our algorithm and commonly used GeneMark. The presented algorithm works better especially for smaller learning sets. Keywords; ORF, gene finding, Markov chains

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Algorithm for Performance Evaluation of Homogeneous Architectural Styles

Software architecture is considered one of the most important indices of software engineering today. Software Architecture is a technical description of a system indicating its component structures and their relationships, and is the principles and rules governing designing. The success of the software depends on whether the system can satisfy the quality attributes. One of the most critical as...

متن کامل

A New Algorithm for Performance Evaluation of Homogeneous Architectural Styles

Software architecture is considered one of the most important indices of software engineering today. Software Architecture is a technical description of a system indicating its component structures and their relationships, and is the principles and rules governing designing. The success of the software depends on whether the system can satisfy the quality attributes. One of the most critical as...

متن کامل

Bacterial genomes lacking long-range correlations may not be modeled by low-order Markov chains: The role of mixing statistics and frame shift of neighboring genes

We examine the relationship between exponential correlation functions and Markov models in a bacterial genome in detail. Despite the well known fact that Markov models generate sequences with correlation function that decays exponentially, simply constructed Markov models based on nearest-neighbor dimer (first-order), trimer (second-order), up to hexamer (fifth-order), and treating the DNA sequ...

متن کامل

Long Term Behavior of Cyclic Non-Homogeneous Fuzzy Markov Chain

We consider cyclic non homogeneous fuzzy Markov chains where there are uncertainties in the transition possibilities. These uncertainties are modeled by triangular fuzzy number. Using the algorithm for finding the greatest eigen fuzzy sets we have analyzed the long term behavior of the system and this is illustrated with the numerical example. Mathematics Subject Classification: 03E72, 60J10

متن کامل

Stochastic Dynamic Programming with Markov Chains for Optimal Sustainable Control of the Forest Sector with Continuous Cover Forestry

We present a stochastic dynamic programming approach with Markov chains for optimal control of the forest sector. The forest is managed via continuous cover forestry and the complete system is sustainable. Forest industry production, logistic solutions and harvest levels are optimized based on the sequentially revealed states of the markets. Adaptive full system optimization is necessary for co...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010